Method and system for conducting a full text search on a client system by a server system

ABSTRACT

A full text search involving an index of a string of characters on a client for use on a server. The client searches for data and file information to share and creates a character string containing the information. This string is transformed using the Burrows-Wheeler method. A rotation matrix is created and the last column compressed before transmission. The server decompresses the data, reverses the transformation and creates a suffix array. The string and suffix array are stored. A second client search can be conducted of the suffix array. The server sends the second client a list of located information. A message may then be directed between the second and first clients without server involvement. Each client on the server will have the string and suffix array stored in the server until it signs off. The server has a dynamic index of data available for transfer between clients.

RELATION TO PRIOR FILED APPLICATIONS

This application claims the benefit of the filing of U.S. Provisional Patent Application Ser. No. 60/194,428, filed Apr. 4, 2000.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to systems and methods for conducting computer based text searching. The present invention relates more specifically to systems and methods for carrying out a text search from a server computer system on data and file information located on a client computer system.

2. Background Information

The Internet comprises a vast number of computers and computer networks that are interconnected through communication links. The interconnected computers exchange information using various services such as electronic mail, Gopher, FTP, and the World Wide Web. All of these technologies require a level of knowledge that is greater than that possessed by the average Internet user who might want to share information. Additionally, the Internet is in a constant state of fluctuation. FTP servers and web sites come and go. A person searching for a particular file using a web browser and any of the popular search engines (Lycos®, Yahoo®, Alta Vista®, etc.) can expect to have mixed to poor results because of such factors as stale links, ratio or account requirements on FTP sites, as well as unknown bandwidth availability on a given site.

Typically when a user wishes to share some of his files, he designs a web site and/or acquires and sets up an FTP server. Either of these tasks requires more expertise than the average Internet user possesses; therefore, much of what could be shared on the Internet is not.

When a user decides to look for a file on the Internet, he will typically use his web browser to contact a search engine. Since major search engines face the daunting task of trying to index every single web page and/or FTP site, the information they return will necessarily be aged and incomplete. Often a search of FTP servers will yield the location of a file and whether the FTP server will be online. The owner of the FTP site, however, will typically have further requirements, such as a user account, or he may require users to upload files before he will allow the user to download anything.

What is needed is an efficient method of creating a dynamic and constantly updated index of that information available on the Internet so that when a person conducts a search and locates information, the person knows that the information is immediately available.

SUMMARY OF THE INVENTION

In view of the above, the present invention is advantageous in that it provides a dynamic and constantly updated searchable index of information that is available on the Internet. To accomplish this, the disclosed invention provides a suffix array search system that allows the rapid searching of large amounts of information from large numbers of users while minimizing the required amount of bandwidth and minimizing the amount of utilized server system resources. The result is that the present invention enables a person searching the Internet to quickly locate and transfer available information.

The present invention may be summarized as a system and method for conducting a full text search on a client system by creating a full text search index of a string of characters on a client system for use on a server system. When a client system signs on to a server system, the client's system searches for relevant data and file information about that data which the user is willing to share and creates an original string of characters that contains file information such as file name, location, and size. The original string of characters is transformed using the Burrows-Wheeler transformation method. In the transformation, a rotation matrix is created of the original string of characters and the last column of the matrix is compressed using a standard compression method before being transmitted to the server system. The server system decompresses the data using the corresponding standard decompression method. The transformation of the file information is reversed to recover the original string of characters. While recovering the original string of characters, a suffix array is created. The original string of characters and suffix array are stored in the memory of the server system. A binary search can be conducted of the suffix array to efficiently locate any sub-string of characters within the original string of characters.

A second client system signing on to the server system can initiate a search of the memory of the server system for a selected sub-string of characters. Once the selected sub-string of characters is found, the server system sends the second client system a list of the located relevant information (filename, location, size, user IP, user port, etc.). If the second user wants to obtain a copy of the data, a message is sent directly between the second client system and the first client system without the server system being involved unless the first client system is behind a firewall. When the first client system is behind a firewall, the request for the file is relayed through the server system. The requested data will then be transferred from the first client system to the second client system.

Each client system willing to share data that is signed on to the server system will have the original string of characters and the suffix array created for the client system and stored in the server system memory only while the client system is signed on to the server system. As soon as the client system signs off the server system, that client system's original string of characters and suffix array are deleted from the server system. This creation of a client system's original string of characters and suffix array only while the client system is signed on the server system enables the server system to have a dynamic and constantly updated index of data, which is available for transfer between client systems.

Other objects and advantages of the present invention will become apparent from the following description of the preferred embodiment with reference to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart illustrating an embodiment of the method of the present invention.

FIG. 2 is a high level architectural drawing illustrating the primary components of a system that operates in accordance with the present invention.

FIG. 3 is a high level architectural drawing illustrating the primary components of the present invention illustrating a search for data in multiple search servers.

FIG. 4 is a flow chart illustrating steps in the method of the present invention carried out on a client system and a server system.

FIG. 5 is a flow chart illustrating the search process for data on search servers in the method of the present invention.

FIG. 6 is a screen image for a client system illustrating the search entry mechanism and search results in one embodiment of the present invention.

FIG. 7 is a flow chart further illustrating aspects of a registration process.

FIG. 8 is a flow chart further illustrating aspects of a search process.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention is a method and system for conducting a full text search on a client system by creating a full text search index of a string of characters on a client system for use on a server system. The general operation of the system as a whole is described in FIG. 1. A client software program (hereinafter “Client 1”) is used to identify which files a user wishes to share with other users. Client 1 will usually access the Internet and log into a registration server which assigns Client 1 to a search server (Step 10). The search server may be one of many search servers available to the registration server. Client 1 creates an original string of characters, commonly referred to as a text file, consisting of the name, location, size, and other file information of each file the user wishes to share (Step 12). For searching purposes, each upper case character in the text file is converted to a lower case character. The original string of characters is rearranged by Client 1 using the Burrows-Wheeler transformation method (Step 14). In the transformation, a rotation matrix is created of the original string of characters and the last column of the matrix is compressed (Step 16) using conventional compressing techniques (e.g., run length encoding, move to front encoding, order-0 adaptive arithmetic encoding, etc.). Client 1 then transmits the compressed file information to the search server (Step 18). If character case retention is desired, the original string of characters, after being converted to lower case, may be appended with a string of bytes, each bit of which represents the original case of each character of the original text. This bit information is then available to the search server to restore the proper case of characters when returning search results. With such case retention, the rotation matrix is created out of a lower case rendition of the original text with the case bits appended to the end of the string.
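By way of illustration only, the case-retention scheme described above may be sketched as follows. The function names `encode_case` and `restore_case`, the bit ordering within each byte, and the restriction to ASCII text are assumptions made for this sketch; the description does not specify them.

```python
def encode_case(text: str) -> tuple[str, bytes]:
    """Return the lower-cased text plus one bit per character marking upper case.

    Assumption: ASCII input, so lower-casing never changes the string length.
    """
    bits = bytearray((len(text) + 7) // 8)
    for i, ch in enumerate(text):
        if ch.isupper():
            bits[i // 8] |= 1 << (i % 8)  # bit i of the bitmap <-> character i
    return text.lower(), bytes(bits)


def restore_case(lowered: str, bits: bytes) -> str:
    """Rebuild the original casing from the lower-cased text and the case bits."""
    out = []
    for i, ch in enumerate(lowered):
        was_upper = (bits[i // 8] >> (i % 8)) & 1
        out.append(ch.upper() if was_upper else ch)
    return "".join(out)


lowered, case_bits = encode_case("Bananas.MP3")
original = restore_case(lowered, case_bits)
```

In such a sketch the search operates on `lowered`, while `case_bits` would only be consulted when a result is returned to the requesting client.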

The Burrows-Wheeler Transform method is a data compression algorithm developed by M. Burrows and D. J. Wheeler which transforms a block of data into a format that is well suited for compression. A detailed description of this method may be found in M. Burrows and D. J. Wheeler, “A Block-sorting Lossless Data Compression Algorithm”, SRC Research Report, May 10, 1994; and Nelson, Mark, “Data Compression with the Burrows-Wheeler Transform”, Dr. Dobb's Journal, September 1996. Each of these articles is hereby incorporated by reference in its entirety as if it were completely re-written herein.

The search server receives the compressed file (Step 20), validates it (Step 22), and decompresses it (Step 24) using the reciprocal conventional decompression techniques. During the process of restoring each character in the Burrows-Wheeler Transform to its original position in the original string of characters, the character's position is noted and the first character of the rotation it represents is recorded in a suffix array (Step 26). The novel and non-obvious creation of the suffix array is used for searching purposes (Step 28), which is described in more detail below.

Reference is now made to FIG. 2 for a brief description of a system architecture appropriate for implementing the methods of the present invention. The elements of the system shown in FIG. 2 are typical of client and server systems that make up a part of the Internet. Client systems 30 and 32 represent two of many such typical client systems. Client system 30 represents a system that originally implements the method of the present invention to identify files that the user is willing to share. The initial process (A) comprises logging in with a registration server 34. In this manner a record set of client system 30 shared files is stored in process (B) in a text file library database 36. The compressed file is further sent in process (C) to a corresponding search server 40.

An inquiry search initiated by client system 30 in process (D) would initiate the methods described above to identify the text string through a first assigned search server 38 and then through other search servers 40 and 42 in process (E) if necessary. A search result is returned to client system 30 in process (F) which may provide for a direct communication in process (G) between client system 30 and a second client system 32 where the searched for data may be located.

FIG. 3 represents the system architecture evident when multiple search servers are required in a data search. Here client system 50 communicates directly with an assigned search server system 52 which in turn acts as a client to communicate with a number of additional search servers 54, 56 and 58. Each of the search server systems 52, 54, 56 and 58 is in communication with client library databases 60, 62, 64 and 66 respectively. The client libraries found on databases 60, 62, 64 and 66 contain the record sets of user shared file identifiers for registered clients associated with those search server systems 52, 54, 56 and 58.

FIG. 4 provides an alternate description of the initial client “registration” process described above with respect to FIG. 1. The arrangement of process steps shown in FIG. 4 is separated between the client system (step group 84 on the left in the figure) and the assigned server system (step group 98 on the right in the figure). The initial action of the client to login to a search server (Step 70) is responded to by the search server by receiving the client login (Step 86). The following steps of gathering the data on the client system (Step 72), creating the original string of characters (Step 74), applying the Burrows-Wheeler Transform (Step 76), compressing the data (Step 78) and transmitting the data (Step 80) are all carried out on the client system. It is understood that the software code necessary to implement these steps, as described in more detail above and below, has been provided to the client system, typically through an appropriate download of the software from a central server location. The transmission of the compressed data (Step 80) completes the processing (Step 82) at the client system. The search server picks up the process by receiving the compressed data (Step 88), decompressing the data (Step 90), applying the reverse Burrows-Wheeler Transform (Step 92) and creating the Suffix Array (also Step 92). The search server then stores the Suffix Array and the original string of characters (Step 94) which now makes them available for searching (Step 96) while having utilized only minimal server resources.

Each client system willing to share data which is signed on to the server system will have the original string of characters and the suffix array created by the server system for the client system and stored in the server system memory while the client system is signed on to the server system. As soon as the client system signs off the server system, that client system's original string of characters and suffix array are deleted from the server system. This creation of a client system's original string of characters and suffix array only while the client system is signed on the server system enables the server system to have a dynamic and constantly updated index of data, which is available for transfer between client systems.
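The sign-on and sign-off behavior just described may be pictured, purely as a sketch, with an in-memory registry keyed by client. The class name `SearchServer`, its method names, and the use of a direct sort to build the suffix array (in place of the construction performed during the reverse transformation) are assumptions made for illustration.

```python
class SearchServer:
    """Hypothetical in-memory registry: one library per signed-on client."""

    def __init__(self) -> None:
        # client id -> (original string, suffix array)
        self.libraries: dict[str, tuple[str, list[int]]] = {}

    def sign_on(self, client_id: str, text: str) -> None:
        # Stand-in construction: sort suffix start positions directly.
        suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
        self.libraries[client_id] = (text, suffix_array)

    def sign_off(self, client_id: str) -> None:
        # The client's string and suffix array are discarded on sign-off.
        self.libraries.pop(client_id, None)


server = SearchServer()
server.sign_on("client1", "bananas~")
stored_while_online = server.libraries["client1"][1]
server.sign_off("client1")
```

Because an entry exists only between `sign_on` and `sign_off`, the registry reflects only data that is available at that moment.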

A second client software program (Client 2) signing on to the server system can initiate a search of the memory of the server system for a selected sub-string of characters. Client 2's search request is converted to lower case. A binary search is performed on each of the suffix arrays in the search server memory to rapidly determine if the requested data exists in any of the libraries stored on the search server. If the requested sub-string of characters is identified, then that client's (Client 1) Internet address, user IP, user port, file location, filename (with case restored), file size, etc. are sent to Client 2 by the search server. If the second user wants to obtain a copy of the data, a message is sent directly between Client 2 and Client 1 without the server system being involved unless Client 1 is behind a firewall. In this case the request for the file is relayed through the server system. The requested data will then be directly transferred from Client 1 to Client 2.

If the requested sub-string of characters is not found in Client 1's library, the other libraries contained on the search server are searched. If Client 2's requested sub-string of characters still hasn't been filled, the search server then acts as a client, requesting the file from as many of the other search servers associated with the registration server as are readily available. Once the sub-string of characters is found on a client's shared library, the search server then relays that client's Internet address, user IP, user port, file location, filename (with case restored), file size, etc. to Client 2.

The suffix array search method and system of the present invention allows the rapid and efficient searching of large amounts of information from large numbers of users while minimizing the required amount of bandwidth and conservatively using server resources. The preferred embodiment involves creating a set of information on a client computer system using the client computer system's computing resources, and then transporting the information to a central server where the information can be efficiently searched. In order to search a large block of text efficiently an index is created. The suffix array of the present invention is an array of all the suffixes of a string in lexicographical order, created to be able to perform a binary search. A binary search is an algorithm to search such an array. The search begins with an interval covering the whole array. If the search value is less than the item in the middle of the interval, the search narrows the interval to the lower half. Otherwise the search narrows the interval to the upper half. The search repeatedly checks for the sub-string of information until the sub-string of characters is found or the interval is empty. This suffix array is a compact and desirable structure for such searching purposes.

EXAMPLE 1

The following is an example of a suffix array using the word “Bananas” as the original string of characters. Since the word “Bananas” has 7 characters it therefore has 7 possible suffixes. The ‘~’ is a special character marking the end of the string.

Index for each character in “bananas”:

TABLE 1
Character   b  a  n  a  n  a  s  ~
Index       0  1  2  3  4  5  6  7

A list of the possible suffixes in lexicographic order with the specific suffix number is shown:

TABLE 2
Suffix     Index
ananas     1
anas       3
as         5
bananas    0
nanas      2
nas        4
s          6
~          7

The actual suffix array is the series of numbers (1,3,5,0,2,4,6,7) which represents the character index of each possible suffix of the original string of characters: “bananas~”. Bananas is indexed as “0” since it is the original string of characters. By taking away the letter “b”, the first possible suffix is created as “ananas” and is indexed as “1”. By taking away the letters “ba”, the second possible suffix is created as “nanas” and is indexed as “2”. By taking away the letters “ban”, the third possible suffix is created as “anas” and is indexed as “3”. This is repeated for the entire length of the original string of characters. The example could be expanded to include a string of characters of any length, for example, a user's list of all files that he is sharing. It is important to note that each of the suffixes does not have to be rewritten; all that is now required to store is the original string of characters and the list of pointers. A very efficient binary search through a large string of characters to determine if there is a match is now available.
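The suffix array of Example 1 can be reproduced with a few lines of code. The following is a minimal sketch that sorts suffix start positions directly; the function name `suffix_array` is an assumption, and the direct sort is used here only for clarity rather than as the construction method of the preferred embodiment.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of all suffixes of s, in lexicographic order of the suffixes."""
    return sorted(range(len(s)), key=lambda i: s[i:])


text = "bananas~"          # '~' marks the end of the string and sorts after letters
sa = suffix_array(text)    # only the integers are stored, not the suffix strings
suffixes_in_order = [text[i:] for i in sa]
```

Only `text` and `sa` need to be kept; any suffix can be read back from `text` starting at the stored index.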

EXAMPLE 2

The following provides an example of the sort rotations step of the Burrows-Wheeler Transform using the original string of characters of the first example, i.e., “Bananas”.

A block of “N” (N=8 in this example) characters “S” (S=bananas~ in this example) is organized in a conceptual N-by-N matrix whose elements are characters and whose rows are the rotations (cyclic shifts) of S, sorted in lexicographical order. This example uses the word “bananas” as the original string of characters. This could be any length string. A matrix of characters is formed whose rows are cyclic shifts of the subject string, sorted in lexicographic order starting with the first column “(F)”.

TABLE 3
Row  (F)irst              (L)ast
0    a  n  a  n  a  s  ~  b
1    a  n  a  s  ~  b  a  n
2    a  s  ~  b  a  n  a  n
3    b  a  n  a  n  a  s  ~
4    n  a  n  a  s  ~  b  a
5    n  a  s  ~  b  a  n  a
6    s  ~  b  a  n  a  n  a
7    ~  b  a  n  a  n  a  s

The second step of the transformation is to find the last characters of the rotations which are located in the last column in Table 3 above, under “(L)”. The first column of sorted characters (F) results in “aaabnns~” and the last column of sorted characters (L) results in “bnn~aaas”. The last column (L) is the transformed data that will be compressed, “bnn~aaas”.
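The two compression-side steps (sort rotations, find last characters) may be sketched as below. The function name `bwt` is an assumption, and materializing every rotation is done only for readability; it is not suggested as an efficient implementation.

```python
def bwt(s: str) -> tuple[str, int]:
    """Return the last column (L) of the sorted rotation matrix of s and the
    row number at which the unrotated string s appears."""
    n = len(s)
    # Each rotation is identified by its starting offset in s.
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    first = "".join(s[i] for i in order)            # column (F), for reference
    last = "".join(s[(i - 1) % n] for i in order)   # column (L)
    assert first == "".join(sorted(s))              # F is just the sorted characters
    return last, order.index(0)


last_column, original_row = bwt("bananas~")
```

For the string of this example, `last_column` and `original_row` are the two values the client would need to convey to the search server.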

In comparing the sorted rotations in Example 2 with the suffix array in Example 1 a similarity is noted. If each string in Example 2 is truncated at the end marker ‘~’ the suffix array is identical to the sorted rotations of the Burrows-Wheeler Transform.

At this point conventional compression techniques are applied to the last column of data “bnn~aaas” such as run length encoding, move to front encoding, order-0 adaptive arithmetic encoding, or similar techniques. The data is then transmitted to the server system where the data is decompressed using the reciprocal technique.
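As one illustration of a conventional technique named above, move to front encoding and its reciprocal can be sketched as follows. The function names and the fixed five-character alphabet are assumptions for this example; a practical coder would typically use the full byte alphabet and follow this stage with run length or arithmetic encoding.

```python
def mtf_encode(data: str, alphabet: str) -> list[int]:
    """Replace each character by its current position in a self-organizing list."""
    table = list(alphabet)
    codes = []
    for ch in data:
        idx = table.index(ch)
        codes.append(idx)
        table.insert(0, table.pop(idx))  # move the character just seen to the front
    return codes


def mtf_decode(codes: list[int], alphabet: str) -> str:
    """Reciprocal technique: replay the same list updates to recover the characters."""
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


ALPHABET = "abns~"  # assumed alphabet: the distinct characters of the example
codes = mtf_encode("bnn~aaas", ALPHABET)
decoded = mtf_decode(codes, ALPHABET)
```

Runs of identical characters in the last column become runs of zeros in `codes`, which is what makes the subsequent encoding stage effective.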

By applying the reverse Burrows-Wheeler Transform, Table 3 shown in Example 2 can be recreated knowing only the contents of the last column (L), “bnn~aaas”, and the row position of the original string of characters (row 3 in this example, “bananas~”). These are given to the search server by the client to recreate the first column (F). A sort of all of the characters from the last column (L) “bnn~aaas” is conducted resulting in a lexicographic list which in this example is “aaabnns~”. A list of predecessor characters is then built using the last column (L) “bnn~aaas” and determining how many predecessor characters exist in the sorted lexicographic list “aaabnns~”. This results in:

‘b’ has 3 predecessors “aaa”, ‘n’ has 4 predecessors “aaab”, ‘n’ has 5 predecessors “aaabn”, ‘~’ has 7 predecessors “aaabnns”, ‘a’ has 0 predecessors (none), ‘a’ has 1 predecessor “a”, ‘a’ has 2 predecessors “aa”, and ‘s’ has 6 predecessors “aaabnn”.

This list of values (3,4,5,7,0,1,2,6) are transformation vectors “T” for restoring the original string of characters.

T        3  4  5  7  0  1  2  6
Vector   0  1  2  3  4  5  6  7

The transformation vector list can be listed in numeric order resulting in:

T                    0  1  2  3  4  5  6  7
Vector               4  5  6  0  1  2  7  3
Last Column String   b  n  n  ~  a  a  a  s

These lists are then used to recreate the original string of characters on the server system, as well as the suffix array that will be used for binary searches.
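The predecessor counts and the re-sorted vector of this example can be derived mechanically from the last column alone. The sketch below relies on Python's stable sort so that equal characters keep their relative order; the function name `transformation_vectors` and the variable names are assumptions.

```python
def transformation_vectors(last: str) -> tuple[list[int], list[int]]:
    """From column L, return (predecessors, vector).

    predecessors[i] = how many characters precede L[i] in the sorted list F
    vector[j]       = position in L of the character that sorts to row j of F
    """
    # A stable sort of positions by character gives the vector directly.
    vector = sorted(range(len(last)), key=lambda i: last[i])
    predecessors = [0] * len(last)
    for row, pos in enumerate(vector):
        predecessors[pos] = row
    return predecessors, vector


predecessors, vector = transformation_vectors("bnn~aaas")
```

The two returned lists are inverse permutations of one another, which is why listing the first in numeric order yields the second.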

Given that position 3 of L represents the end point of the original string, the transformation vector for position 3 is a zero, which indicates that the next (first) character in the original string is the character ‘b’ that occupies position 0 in the L (bnn~aaas) list. Since the suffix represented by this ‘b’ (‘bananas’) occupies position 3 of the sorted matrix, we know that the correct character index for position 3 of the suffix array is 0.

The transformation vector that is in position 0 is 4, indicating that the character ‘a’ in position 4 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘ba’. Since the suffix represented by this ‘a’ (‘ananas’) occupies position 0 of the sorted matrix, we know that the correct character index for position 0 of the suffix array is 1.

The transformation vector that is in position 4 is 1, indicating that the character ‘n’ in position 1 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘ban’. Since the suffix represented by this ‘n’ (‘nanas’) occupies position 4 of the sorted matrix, we know that the correct character index for position 4 of the suffix array is 2.

The transformation vector that is in position 1 is 5, indicating that the character ‘a’ in position 5 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘bana’. Since the suffix represented by this ‘a’ (‘anas’) occupies position 1 of the sorted matrix, we know that the correct character index for position 1 of the suffix array is 3.

The transformation vector that is in position 5 is 2, indicating that the character ‘n’ in position 2 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘banan’. Since the suffix represented by this ‘n’ (‘nas’) occupies position 5 of the sorted matrix, we know that the correct character index for position 5 of the suffix array is 4.

The transformation vector that is in position 2 is 6, indicating that the character ‘a’ in position 6 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘banana’. Since the suffix represented by this ‘a’ (‘as’) occupies position 2 of the sorted matrix, we know that the correct character index for position 2 of the suffix array is 5.

The transformation vector that is in position 6 is 7, indicating that the character ‘s’ in position 7 of the L (bnn~aaas) list is the next character of the original string, resulting in ‘bananas’. Since the suffix represented by this ‘s’ (‘s’) occupies position 6 of the sorted matrix, we know that the correct character index for position 6 of the suffix array is 6.

The transformation vector that is in position 7 is 3, indicating that we have reached the end of the string. Since the suffix represented by this ‘~’ (‘~’) occupies position 7 of the sorted matrix, we know that the correct character index for position 7 of the suffix array is 7.

While recreating the original string of characters, at each step as the system vectors through the transformed data, the current position is kept track of and the index of each character in the suffix array is recorded as shown below. This process of recreating the original string of characters and creating the suffix array is shown as:

The given starting position is 3, Suffix Array [3]=0

T[3]=0; L[0]=‘b’, which is stored in position 0 of the restored string. Suffix Array [0]=1;

T[0]=4; L[4]=‘a’, which is stored in position 1 of the restored string. Suffix Array [4]=2;

T[4]=1; L[1]=‘n’, which is stored in position 2 of the restored string. Suffix Array [1]=3;

T[1]=5; L[5]=‘a’, which is stored in position 3 of the restored string. Suffix Array [5]=4;

T[5]=2; L[2]=‘n’, which is stored in position 4 of the restored string. Suffix Array [2]=5;

T[2]=6; L[6]=‘a’, which is stored in position 5 of the restored string. Suffix Array [6]=6;

T[6]=7; L[7]=‘s’, which is stored in position 6 of the restored string. Suffix Array [7]=7;

The inventive step of keeping track of the current position as the system vectors through the transformed data and recording the index of each character in the suffix array creates a suffix array list. This suffix array list is resorted in lexicographic order, which in the above example extracts the following sorted suffix array (1, 3, 5, 0, 2, 4, 6, 7). The sorted suffix array is the index in the preferred embodiment for rapidly and efficiently searching the original string “bananas”. The search server stores the original string of characters and sorted suffix array in memory.
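The walk through the transformation vector, with the suffix array recorded along the way, may be sketched as follows. The function name `restore_with_suffix_array` is an assumption. In this sketch each index is written straight into the slot given by the current row of the sorted matrix, so the resulting list is already in lexicographic order.

```python
def restore_with_suffix_array(last: str, start_row: int) -> tuple[str, list[int]]:
    """Reverse the transform and record the suffix array in the same pass.

    last      -- column (L) of the sorted rotation matrix
    start_row -- row of the matrix holding the original string
    """
    n = len(last)
    # T[row] = position in L of the character that sorts to that row (stable sort).
    T = sorted(range(n), key=lambda i: last[i])
    restored = []
    suffix_array = [0] * n
    row = start_row
    for index in range(n):
        suffix_array[row] = index   # the suffix starting at `index` sits in this row
        row = T[row]                # follow the vector to the next character
        restored.append(last[row])
    return "".join(restored), suffix_array


text, sa = restore_with_suffix_array("bnn~aaas", 3)
```

A single linear pass therefore yields both items the search server stores: the original string and its suffix array.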

EXAMPLE 3

The following is an example of how a binary search can be used to determine whether or not a string of characters occurs within the string “bananas~”. Given the character string:

TABLE 4
Character   b  a  n  a  n  a  s  ~
Index       0  1  2  3  4  5  6  7

And the suffix array:

TABLE 5
Element   Suffix Represented   Index
0         ananas               1
1         anas                 3
2         as                   5
3         bananas              0
4         nanas                2
5         nas                  4
6         s                    6
7         ~                    7

If searching for the character combination ‘as’ the search begins with an interval covering the whole array of 8 characters. The middle element is number 3 (8/2=4; the 4th element is 3). The value of element 3 of the suffix array is 0. Element 0 of the original string is the ‘b’ which represents the suffix ‘bananas’. Since the length of the search value ‘as’ is 2, only the first two characters of ‘bananas’ are compared. Since ‘as’ is lexicographically less than ‘ba’, the second half of the interval is ruled out and the first half of the full interval is used which results in the second interval as being 4 (8/2=4). The middle element of the second interval is 1 (4/2=2; 3−2=1) which represents the suffix ‘anas’. Since ‘as’ is lexicographically greater than ‘an’ the first half of the second interval is ruled out and the second half of the second interval is used which results in the third interval as being 2 (4/2=2). The middle element of the third interval is 2 (2/2=1; 1+1=2) which represents the suffix ‘as’, which is the string being searched for.

If the search for a character combination is for ‘ax’ instead of ‘as’ the searching steps would be the same except the last step. Since ‘ax’ is lexicographically greater than ‘as’ the first half of the third interval is ruled out and the second half of the third interval is used, which results in the fourth interval as being 1 (2/2=1), which indicates that there is no match. With these examples completed the process is now well described.
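The search of Example 3 may be sketched as a conventional binary search over the suffix array, comparing only as many characters of each suffix as the search value contains. The function name `find`, the half-open interval, and the particular midpoint rule (chosen so that the probed elements coincide with those of the example) are assumptions of this sketch.

```python
def find(text: str, sa: list[int], pattern: str) -> tuple[int, list[int]]:
    """Return (matching element of sa or -1, list of elements probed)."""
    lo, hi = 0, len(sa)          # half-open interval [lo, hi)
    probes = []
    while lo < hi:
        mid = (lo + hi - 1) // 2
        probes.append(mid)
        start = sa[mid]
        prefix = text[start:start + len(pattern)]  # compare len(pattern) chars only
        if prefix == pattern:
            return mid, probes
        if pattern < prefix:
            hi = mid             # rule out the upper part of the interval
        else:
            lo = mid + 1         # rule out the lower part of the interval
    return -1, probes


text = "bananas~"
sa = [1, 3, 5, 0, 2, 4, 6, 7]
hit = find(text, sa, "as")
miss = find(text, sa, "ax")
```

Each probe halves the interval, so the number of comparisons grows with the logarithm of the string length rather than with the length itself.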

Heretofore, all previous uses of the Burrows-Wheeler Transform (BWT) centered on the benefits it brought to the compression of data. The BWT rearranges the order of the characters in a block of input text to make it easy to compress with simple algorithms. This transformation is reversible. A complete sorting of each suffix of the original block of text is required during compression. During the decompression stage it is possible to efficiently build an index of each suffix of the original block of text. This suffix array can then be used to perform binary searches for any string of characters in the text. The method of the preferred embodiment implementing the Burrows-Wheeler Transform has the following steps:

To compress:

Sort rotations

Find last characters of rotations

To decompress:

Find first characters of rotations

Build list of predecessor characters

Record Suffix Array while forming output string

To conduct a search for certain file information, the client system submits a list of words that the user wants to search. The user may structure the search results by limiting the file extensions and limiting the number of returned files. For file extensions the user may select all file extensions or limit it to specific file extensions. The search server arranges these words in order of obscurity, with the least frequently occurring word searched first. It performs a binary search through each library for the most obscure word. In each case where there is a match, the search server further qualifies each record for additional words from the list of words to be searched for. If the search server is unable to return an adequate list of records to the client, it then impersonates a client, requesting the file from as many of the other search servers as are readily available. The search server then relays the resulting client identification, Internet address, port, file location, filename, and file size to the client system. The user can then decide if he wants to get a copy of any of the search results. If the decision is to get a copy, the client system will contact the other client system with the available files and obtain a copy directly from that client system.
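The ordering of search words by obscurity may be illustrated with the following sketch. The function name `search_records`, the per-record list representation of a library, and the `frequency` table are all assumptions; in particular, a plain substring test stands in for the suffix array binary search, and the description does not state how word frequency is measured.

```python
def search_records(records: list[str], words: list[str],
                   frequency: dict[str, int]) -> list[str]:
    """Return records containing every word, testing the rarest word first."""
    ordered = sorted((w.lower() for w in words), key=lambda w: frequency.get(w, 0))
    if not ordered:
        return []
    rarest, remaining = ordered[0], ordered[1:]
    # Stand-in for the binary search on the most obscure word.
    candidates = [r for r in records if rarest in r]
    # Each matching record is then qualified against the remaining words.
    return [r for r in candidates if all(w in r for w in remaining)]


records = ["music/blue moon.mp3", "music/blue sky.mp3", "docs/moon landing.txt"]
frequency = {"blue": 2, "moon": 2, "mp3": 2, "sky": 1}   # hypothetical counts
matches = search_records(records, ["Blue", "Sky"], frequency)
```

Starting with the least frequent word keeps the candidate list short before the remaining words are checked.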

FIG. 5 illustrates in general the process whereby a search is initiated and carried out according to the methods of the present invention. The process shown in FIG. 5 would follow from the “registration” process described and shown in FIGS. 1 and 4. The search effort is initiated at a client system (Step 100) through the user entering the key words for the search, specifying the file extensions to be considered (.mp3, .wav, etc.) and the maximum number of results to be returned (Step 102). When the user selects the “search” button within the software running at the client system (Step 104), the search message is sent from the client system to an assigned search server system. The assigned search server receives the search request (Step 106) and initiates the performance of the search according to the methods described above. As search results are returned within the search server, it continuously queries whether the results returned are greater than or equal to the maximum requested by the client user (Step 108). If so, the search results returned thus far are delivered to the client (Step 112) and the process ends (Step 116). If the maximum has not been reached, then the assigned search server queries whether all servers available to the assigned search server have been searched (Step 110). If so, then again the results returned thus far are delivered to the client (Step 112) and the process ends (Step 116). If additional search servers are available and the maximum number of results has not been reached, then the assigned search server acts as a client (Step 114) and sends the search request to the additional available search servers. These additional search servers act as the assigned server and process and perform the search (Step 106), continuing until either the maximum results are returned (Step 108) or all available search servers have been searched (Step 110). Whatever results have been returned are then communicated to the client system (Step 112) as indicated above, which finally terminates the search process (Step 116).
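The control flow of FIG. 5 can be sketched as a simple loop. This is an illustrative sketch only: the SearchServer class is a hypothetical in-memory stand-in that matches words by plain substring tests, and the function name run_search is likewise an assumption. Step numbers in the comments refer to FIG. 5 as described above.

```python
class SearchServer:
    """Hypothetical in-memory stand-in for a search server."""

    def __init__(self, records):
        self.records = records

    def search(self, words, limit):
        # Stand-in for the real library search: plain substring matching.
        hits = [r for r in self.records if all(w in r for w in words)]
        return hits[:limit]


def run_search(assigned, other_servers, words, max_results):
    """Assigned server searches first, then acts as a client to other servers."""
    results = assigned.search(words, max_results)                  # Step 106
    remaining = list(other_servers)
    while len(results) < max_results and remaining:                # Steps 108 and 110
        peer = remaining.pop(0)
        # Step 114: forward the request, asking only for what is still missing.
        results += peer.search(words, max_results - len(results))
    return results                                                 # Step 112
```

The loop ends on whichever condition is met first: the requested maximum is reached, or every available search server has been queried.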

Reference is now made to FIGS. 7 and 8 for a brief description of alternative refinements to the methods described above with regard to the registration process (FIG. 7) and the search process (FIG. 8). In FIG. 7 the registration process begins as described above at Step 130 with the client logging into the registration server with the appropriate user ID and password. As a response to the login message from the client, the registration server sends the IP addresses and ports of the available and most appropriate search server and decompression server. The objective in this refined preferred embodiment is to delegate the task of decompressing the stored data and creating the suffix array to a separate server, allowing the search server to devote maximum resources to the search process. In FIG. 7, some steps are carried out by search server 148 while others are carried out by decompression server 150. In Step 132, the client is connected to the assigned search server and sends the initial user data information. The search server receives the initial user data from the specified client. In Step 134 the search server creates a user object and adds it to the list of active users connected to the server.

While the search server carries out the registration of initial user data, decompression server 150 receives, at Step 136, the compressed data of the library listing created by the client. At Step 138 the decompression server decompresses the library and creates a suffix array and actual file listing from the data. At Step 140 the decompression server then checks the name of the search server that the client has been assigned to and is presumably connected to. The decompression server checks for the connection on the active list of search servers it is connected to. Once it finds the appropriate search server, it sends the suffix array and the actual file listing (Step 142).
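Step 138 can be illustrated with a short sketch in which the suffix array is filled in while the Burrows-Wheeler transformation is being reversed. Several details here are assumptions rather than statements of the disclosed method: zlib stands in for the unspecified standard compression method, a sentinel character marks the end of the string in place of a transmitted index value, the "transformation vector" is taken to be the usual last-to-first mapping, and all function names are hypothetical.

```python
import zlib

SENTINEL = "\0"  # assumed end marker; sorts before every other character


def forward_bwt(text):
    """Client side: last column of the sorted rotation matrix (naive version)."""
    s = text + SENTINEL
    order = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in order)


def inverse_bwt_with_suffix_array(last):
    """Server side: recover the text and build the suffix array in one pass."""
    n = len(last)
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    # first_row[ch]: first row of the sorted matrix that starts with ch.
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    # vector[i]: row of the rotation that starts one character before row i's.
    seen, vector = {}, []
    for ch in last:
        vector.append(first_row[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    chars, suffix_array = [""] * n, [0] * n
    row, pos = 0, n - 1              # row 0 is the rotation starting with the sentinel
    for _ in range(n):
        suffix_array[row] = pos      # this row is the suffix starting at pos
        chars[pos - 1] = last[row]   # last[row] cyclically precedes position pos
        row, pos = vector[row], pos - 1
    return "".join(chars[:-1]), suffix_array


def handle_library(payload):
    """Decompress a received library and return (text, suffix_array)."""
    last = zlib.decompress(payload).decode("utf-8")
    return inverse_bwt_with_suffix_array(last)
```

Because each step of the reversal already identifies which sorted row corresponds to which text position, the suffix array is obtained without a separate sort on the server. The returned array covers the text plus the sentinel, so its first entry always points at the sentinel.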

At Step 144 in FIG. 7, the search server once again picks up the process by receiving the suffix array and actual file listing from the decompression server and stores the data in conjunction with the appropriate user object (Step 146). If the search server cannot find the user object for this user in the user list, then it frees the data it receives from the decompression server and ignores the message.

FIG. 8 discloses a refined embodiment of the search process of the present invention, again with the objective of relieving the search server of some noncritical tasks. In the search process begun at Step 160, the client sends a search request to the search server. Search server 180 then receives the search request at Step 162 and performs the search in its library listing. If it does not find enough search results (Step 164), it determines if it is the main search server the client is connected to (Step 166). If so, it identifies other available search servers at Step 168 and sends that information, along with any search results it has found, to the client at Step 170. If the search server at Step 164 has returned enough results, it immediately skips to Step 170 wherein it returns the results to the client. If insufficient results have been returned and the search server is not the main search server for the client, the search server recognizes that it is one of a number of secondary servers and simply returns the results it has found to the client at Step 170.

Upon receiving results from various search servers, the client determines at Step 172 whether the results are obtained from the main search server. If so, it checks at Step 176 to see if there was a list of search servers sent with the search results. It then connects to those search servers, back at Step 160, and sends the same search request to them. If the results received by the client are not from the main search server that the client was connected to, the search is still validated, and the results are either discarded or added to other valid search results at Step 174.
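The division of labor in FIG. 8 can be sketched as follows, with the client rather than the search server contacting the secondary servers. This is a hedged illustration: the SearchServer class and the client_search function are hypothetical, matching is by plain substring tests, and the validation at Step 174 is assumed here to mean only that duplicate results are discarded.

```python
class SearchServer:
    """Hypothetical in-memory stand-in for a search server (FIG. 8)."""

    def __init__(self, records, other_servers=()):
        self.records = records
        self.other_servers = list(other_servers)

    def handle(self, words, wanted, is_main):
        """Return (results, referral list) for one search request."""
        hits = [r for r in self.records if all(w in r for w in words)][:wanted]  # Step 162
        if len(hits) < wanted and is_main:        # Steps 164 and 166
            return hits, self.other_servers       # Steps 168 and 170
        return hits, []                           # Step 170


def client_search(main_server, words, wanted):
    """Query the main server, then any search servers it refers the client to."""
    results, referrals = main_server.handle(words, wanted, is_main=True)  # Steps 160, 172
    results = list(results)
    for server in referrals:                                              # Step 176
        hits, _ = server.handle(words, wanted, is_main=False)             # back at Step 160
        for hit in hits:
            # Step 174 (assumed validation): keep only results not already held.
            if hit not in results:
                results.append(hit)
    return results
```

Only the main search server returns a referral list, so secondary servers never trigger further fan-out.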

FIG. 6 provides an example of a display screen presented to the user at a client system after the above process of carrying out a search is accomplished. In the screen view shown, the substance of the search carried out is displayed near the top of the screen, providing information on the text searched (banana in this example), any file extension limitations (none in this example), and the maximum number of results to return (200 in this example). The search button which initiates the search action is also indicated to the user.

The lower section of the screen display shown in FIG. 6 comprises a table of returned search results. This table of results first lists the logged-in client users that retain files (data) that meet the search criteria. Second, this list identifies the file names associated with the client users that specifically were identified. In some cases more than one file is identified for a given client user. It is often the name of the file that allows the searching client user to determine whether the specific file or data matches what he or she is looking for. Finally, the search results returned are identified by file size and user speed. This information additionally allows the searching client user to confirm the file requested and select an optimal source for the transfer of the file or data from the remote client user. The system and method of the present invention provide an efficient and effective searching process that requires a minimal and intuitive user interface.

The many features and advantages of the present invention are apparent from the detailed specification and figures and, thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the present invention. Furthermore, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired that the present invention be limited to the exact construction and operation illustrated and described herein. Accordingly, all suitable modifications and equivalents which may be resorted to are intended to fall within the scope of the claims. Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limited sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is, therefore, contemplated that the appended claims will cover such modifications that fall within the scope of the invention.

We claim:
 1. A method of indexing data with the assistance of a client computer system, the method comprising: a. under control of a client computer system i. gathering file information on data on said client system, ii. conducting a Burrows-Wheeler Transform on said file information, and iii. transmitting said transformed file information to a server system; and b. under control of said server computer system i. receiving said transformed file information, ii. creating the file information from said transformed file information, iii. creating a searchable index of said transformed file information, and iv. storing said searchable index and file information on said server system whereby said searchable index is created for use on said server system using said client system resources.
 2. The method of claim 1 further comprising: a. under control of a client computer system, compressing said transformed file information prior to transmission to said server computer system, and b. under control of said server computer system, decompressing said transformed file information after reception by said server computer system.
 3. The method of claim 1 further comprising: a. under control of said client computer system, i. before said step of conducting a Burrows-Wheeler Transform on said file information, converting each upper case character in the original text of said file information to lower case; ii. after said step of conducting a Burrows-Wheeler Transform on said file information, appending said transformed file information with a string of bytes wherein each bit represents the original case of a character of said original text of said file information; and b. under control of said server computer system, storing said string of bytes on said server system such that restoration of said original case of a character is enabled.
 4. A method of indexing data with the assistance of a client computer system, the method comprising: a. a client system signs on to a server system, b. said client system searches for relevant data and file information of that data which the user is willing to share and creates an original string of characters which contains file information such as file name, location, and size, c. said original string of characters is transformed using the Burrows-Wheeler transformation method, d. during said transformation a rotation matrix is created of the original string of characters and is compressed using a standard compressing method, e. said compressed data is transmitted to the server system, f. said server system decompresses the data using the reciprocal of said standard compressing method, g. said transformation of the file information is reversed to recover the original string of characters, h. while recovering the original string of characters, a suffix array is created, i. said original string of characters and said suffix array are stored in the memory of said server system, and j. a binary search can be conducted of the suffix array to locate any sub-string of characters within said original string of characters.
 5. The method of claim 4 further comprising: a. a second client system signing on to said server system, b. said second client system initiates a search of the memory of said server system for a selected sub-string of characters, c. once the selected sub-string of characters is found, said server system sends said second client system a list of the located relevant file information, d. if the second user wants to obtain a copy of the data, a message is sent directly between said second client system and said first client system, and e. said requested data is transferred from said first client system to said second client system.
 6. The method of claim 5 further comprising a. each client system willing to share data which is signed on to the server system will have said original string of characters and said suffix array created for each said client system and stored in said server system memory while each said client system is signed on to said server system, and b. if a said client system signs off said server system, that said client system's original string of characters and suffix array are deleted from said server system whereby this creation of a client system's original string of characters and suffix array only while the client system is signed on the server system enables the server system to have a dynamic and constantly updated index of data, which is readily available for transfer between client systems.
 7. A method comprising: providing information describing at least one computer readable file operable to be made available by a first client computer system to a second client computer system; transforming the information describing at least one computer readable file into transformed information, wherein at least a portion of the transformed information is in a format suitable for compression and wherein the transforming is reversible; transmitting the at least a portion of the transformed information to a server computer system.
 8. The method of claim 7 wherein the information describing at least one computer readable file further comprises a string of characters including at least one of a file name, a file location, a file type, a file size, a file description, and information about an original case for at least one of the string of characters.
 9. The method of claim 7 further comprising: compressing the at least a portion of the transformed information.
 10. The method of claim 7 wherein the transforming the information describing at least one computer readable file into transformed information further comprises: transforming the information describing at least one computer readable file according to a Burrows-Wheeler transform.
 11. The method of claim 7 wherein the information describing at least one computer readable file further comprises a string of characters and wherein the transforming the information describing at least one computer readable file into transformed information further comprises: representing each of a plurality of suffixes formed from the information describing at least one computer readable file with a respective pointer; sorting the plurality of suffixes using the respective pointers; and determining an array of characters based on the sorting the plurality of suffixes, wherein at least one of the characters in the array of characters is a prefix corresponding to at least one of the plurality of suffixes.
 12. The method of claim 7 wherein the information describing at least one computer readable file further comprises a string of characters, the method further comprising: transmitting to the server computer system an index value indicating a position within the at least a portion of the transformed information of a first character of the string of characters.
 13. The method of claim 7 encoded in a computer readable medium as instructions executable on a computer system.
 14. A method comprising: receiving transformed information, wherein the transformed information is based on information describing at least one computer readable file operable to be made available by a first client computer system to a second client computer system; transforming the transformed information to recover the information describing at least one computer readable file; and determining a plurality of suffix array values during the transforming the transformed information, the plurality of suffix array values indicating a sorted order of a corresponding plurality of suffixes present within the information describing at least one computer readable file.
 15. The method of claim 14 wherein: the information describing at least one computer readable file further comprises a string of characters; the transforming the transformed information further comprises analyzing a transformation vector and the transformed information to determine an original order of a plurality of characters from the information, and the determining a plurality of suffix array values further comprises for at least one of the plurality of characters for which an original order is determined, assigning to a unique suffix array position a value corresponding to a position of the at least one of the plurality of characters in the original order from the information.
 16. The method of claim 15 wherein the unique suffix array position is determined from the transformation vector.
 17. The method of claim 14 wherein the transforming the transformed information to recover the information describing at least one computer readable file further comprises: transforming the transformed information according to a Burrows-Wheeler transform.
 18. The method of claim 14 further comprising: receiving a search string; and using the suffix array to determine if the search string is present in the information describing at least one computer readable file.
 19. The method of claim 14 further comprising: decompressing the transformed information.
 20. The method of claim 14 encoded in a computer readable medium as instructions executable on a computer system.